Univariate Plots Section

## [1] 113937     81

The Prosper loan dataset contains 81 variables and 113,937 observations. However, not all of the variables are worth exploring. The following analysis focuses on loan amount, loan origination date, estimated return, current loan status, Prosper rating, Prosper score, listing category, borrower APR, borrower income, borrower employment status, borrower occupation, borrower home ownership, borrower credit history, borrower state, and borrower total Prosper loans.

## 'data.frame':    113937 obs. of  21 variables:
##  $ LoanOriginalAmount       : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate      : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ EstimatedReturn          : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ LoanStatus               : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ProsperRating..Alpha.    : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore             : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.: int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerAPR              : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ IncomeRange              : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ StatedMonthlyIncome      : num  3083 6125 2083 2875 9583 ...
##  $ DebtToIncomeRatio        : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ EmploymentStatus         : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ Occupation               : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ IsBorrowerHomeowner      : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CreditScoreRangeLower    : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper    : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ MonthlyLoanPayment       : num  330 319 123 321 564 ...
##  $ BorrowerState            : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ TotalProsperLoans        : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ LenderYield              : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  LoanOriginalAmount          LoanOriginationDate EstimatedReturn 
##  Min.   : 1000      2014-01-22 00:00:00:   491   Min.   :-0.183  
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   1st Qu.: 0.074  
##  Median : 6500      2014-02-19 00:00:00:   439   Median : 0.092  
##  Mean   : 8337      2013-10-16 00:00:00:   434   Mean   : 0.096  
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   3rd Qu.: 0.117  
##  Max.   :35000      2013-09-24 00:00:00:   316   Max.   : 0.284  
##                     (Other)            :111428   NA's   :29084   
##                  LoanStatus    ProsperRating..Alpha.  ProsperScore  
##  Current              :56576          :29084         Min.   : 1.00  
##  Completed            :38074   C      :18345         1st Qu.: 4.00  
##  Chargedoff           :11992   B      :15581         Median : 6.00  
##  Defaulted            : 5018   A      :14551         Mean   : 5.95  
##  Past Due (1-15 days) :  806   D      :14274         3rd Qu.: 8.00  
##  Past Due (31-60 days):  363   E      : 9795         Max.   :11.00  
##  (Other)              : 1108   (Other):12307         NA's   :29084  
##  ListingCategory..numeric.  BorrowerAPR              IncomeRange   
##  Min.   : 0.000            Min.   :0.00653   $25,000-49,999:32192  
##  1st Qu.: 1.000            1st Qu.:0.15629   $50,000-74,999:31050  
##  Median : 1.000            Median :0.20976   $100,000+     :17337  
##  Mean   : 2.774            Mean   :0.21883   $75,000-99,999:16916  
##  3rd Qu.: 3.000            3rd Qu.:0.28381   Not displayed : 7741  
##  Max.   :20.000            Max.   :0.51229   $1-24,999     : 7274  
##                            NA's   :25        (Other)       : 1427  
##  StatedMonthlyIncome DebtToIncomeRatio      EmploymentStatus
##  Min.   :      0     Min.   : 0.000    Employed     :67322  
##  1st Qu.:   3200     1st Qu.: 0.140    Full-time    :26355  
##  Median :   4667     Median : 0.220    Self-employed: 6134  
##  Mean   :   5608     Mean   : 0.276    Not available: 5347  
##  3rd Qu.:   6825     3rd Qu.: 0.320    Other        : 3806  
##  Max.   :1750003     Max.   :10.010                 : 2255  
##                      NA's   :8554      (Other)      : 2718  
##  EmploymentStatusDuration                    Occupation   
##  Min.   :  0.00           Other                   :28617  
##  1st Qu.: 26.00           Professional            :13628  
##  Median : 67.00           Computer Programmer     : 4478  
##  Mean   : 96.07           Executive               : 4311  
##  3rd Qu.:137.00           Teacher                 : 3759  
##  Max.   :755.00           Administrative Assistant: 3688  
##  NA's   :7625             (Other)                 :55456  
##  IsBorrowerHomeowner CreditScoreRangeLower CreditScoreRangeUpper
##  False:56459         Min.   :  0.0         Min.   : 19.0        
##  True :57478         1st Qu.:660.0         1st Qu.:679.0        
##                      Median :680.0         Median :699.0        
##                      Mean   :685.6         Mean   :704.6        
##                      3rd Qu.:720.0         3rd Qu.:739.0        
##                      Max.   :880.0         Max.   :899.0        
##                      NA's   :591           NA's   :591          
##  MonthlyLoanPayment BorrowerState   TotalProsperLoans  LenderYield     
##  Min.   :   0.0     CA     :14717   Min.   :0.00      Min.   :-0.0100  
##  1st Qu.: 131.6     TX     : 6842   1st Qu.:1.00      1st Qu.: 0.1242  
##  Median : 217.7     NY     : 6729   Median :1.00      Median : 0.1730  
##  Mean   : 272.5     FL     : 6720   Mean   :1.42      Mean   : 0.1827  
##  3rd Qu.: 371.6     IL     : 5921   3rd Qu.:2.00      3rd Qu.: 0.2400  
##  Max.   :2251.5            : 5515   Max.   :8.00      Max.   : 0.4925  
##                     (Other):67493   NA's   :91852

Univariate Analysis

What is the structure of your dataset?

What is/are the main feature(s) of interest in your dataset?

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Did you create any new variables from existing variables in the dataset?

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Bivariate Plots Section

## NULL

Select the variable pairs whose absolute correlation coefficient is higher than 0.4 for further analysis.
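The selection rule above can be sketched in base R as follows. This is only an illustration under stated assumptions: the `toy` data frame, its made-up values, and the `Noise` column are hypothetical stand-ins for the numeric loan columns, and the report's actual code is not shown here.

```r
# Toy data frame standing in for the numeric loan columns (values are made up).
set.seed(1)
n <- 200
amount <- runif(n, 1000, 35000)
toy <- data.frame(
  LoanOriginalAmount = amount,
  MonthlyLoanPayment = amount * 0.03 + rnorm(n, sd = 50),
  Noise              = rnorm(n)
)

# Pairwise Pearson correlations, using all complete observations for each pair.
cors <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) where |r| exceeds the threshold.
threshold <- 0.4
idx <- which(abs(cors) > threshold & upper.tri(cors), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r    = cors[idx]
)
print(strong_pairs)
```

Restricting the search to the upper triangle avoids listing every pair twice and skips the trivial correlation of each variable with itself.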

## Warning: Removed 1491 rows containing non-finite values (stat_smooth).
## Warning: Removed 3422 rows containing missing values (geom_point).

## Warning: Removed 947 rows containing non-finite values (stat_smooth).
## Warning: Removed 947 rows containing missing values (geom_point).

## Warning: Removed 753 rows containing missing values (geom_point).

## Warning: Removed 266 rows containing non-finite values (stat_smooth).
## Warning: Removed 266 rows containing missing values (geom_point).

Next, we analyze the effect of the factor variables.

The analysis above shows that the key variables are the borrower APR and the original loan amount; the other variables are related to one or both of them.

Next, I analyze how the other factors relate to these two variables.

To simplify the analysis, I regroup the levels of those factors based on practical experience:

For loan status, we can define four groups: In process (Current and final payment in process), Past Due (all Past Due levels), Completed, and Bad debt (Defaulted and Chargedoff). However, since a Completed loan could previously have been in any of the other groups, I analyze only the other three.

For the loan origination year, we can split the loans into pre-2009 and post-2009 groups, because of the financial crisis.
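The regrouping described above could be sketched as follows. The rows in `loans` are made up, the helper `group_status` is hypothetical, the level name `FinalPaymentInProgress` is assumed for the final-payment status, and treating loans originated in 2009 itself as post-2009 is an assumption, since the text does not say where that year falls. The new column names `status` and `year` follow the names printed later in the report.

```r
# Toy rows standing in for the loan data (values are made up).
loans <- data.frame(
  LoanStatus = c("Current", "FinalPaymentInProgress", "Past Due (1-15 days)",
                 "Completed", "Defaulted", "Chargedoff"),
  LoanOriginationDate = c("2013-11-13 00:00:00", "2014-01-22 00:00:00",
                          "2013-09-24 00:00:00", "2007-03-01 00:00:00",
                          "2008-06-15 00:00:00", "2010-02-02 00:00:00"),
  stringsAsFactors = FALSE
)

# Collapse the raw status levels into the four groups described above.
# Any level not covered by the four groups becomes NA.
group_status <- function(s) {
  ifelse(s %in% c("Current", "FinalPaymentInProgress"), "In process",
  ifelse(grepl("^Past Due", s), "Past Due",
  ifelse(s %in% c("Defaulted", "Chargedoff"), "Bad debt",
  ifelse(s == "Completed", "Completed", NA))))
}
loans$status <- group_status(loans$LoanStatus)

# Assumption: loans originated in 2009 itself are counted as post-2009.
origin_year <- as.integer(substr(loans$LoanOriginationDate, 1, 4))
loans$year <- ifelse(origin_year < 2009, "pre 2009", "post 2009")

# Drop Completed loans, since they could have passed through any other group.
analysed <- loans[!is.na(loans$status) & loans$status != "Completed", ]
print(table(analysed$status, analysed$year))
```

Matching the Past Due levels with a prefix pattern keeps the sketch short and covers every delinquency bucket without listing each one.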

Now compare the original loan amount pairwise against the other factors.

We find that all of the factors above affect the original loan amount.

Since the AA data peak at 15000, we exclude them and analyze again. Since we have already analyzed the relationship between BorrowerAPR and Prosper rating, we now focus on the other two factors.

## [1] "BorrowerAPR" "status"      "year"

We find that all of the factors above affect BorrowerAPR.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

What was the strongest relationship you found?

Multivariate Plots Section

## [1] -2286.053

The Current and final-payment-in-process loans lie in the lower region, while the Past Due and Bad debt loans lie in the upper region.

## [1] -3404.868

The monthly payment is determined by the original loan amount, the interest rate, and the payment duration. The interest rate is influenced by the market, so there are three regions in the plot, indicating three different market rates. Within each region, loans with a better status show a lower slope of monthly payment against original loan amount, which indicates a longer duration for loans with a better status.
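To make the link between slope and duration concrete, here is a minimal sketch using the standard annuity formula for a fixed-rate, fully amortizing loan with monthly compounding. That the Prosper loans follow this formula is an assumption, not something the dataset output confirms; the function `monthly_payment` is hypothetical, and the 20% rate and the three terms are illustrative values only.

```r
# Monthly payment of a fixed-rate, fully amortizing loan (standard annuity formula).
# principal: amount borrowed; annual_rate: e.g. 0.20 for 20%;
# term_months: number of monthly payments. Expects scalar inputs.
monthly_payment <- function(principal, annual_rate, term_months) {
  r <- annual_rate / 12                      # monthly interest rate
  if (r == 0) return(principal / term_months)
  principal * r / (1 - (1 + r)^(-term_months))
}

# For a fixed rate and term the payment is proportional to the principal,
# so payment / principal is the slope of a line in the scatter plot.
# The rate and terms below are illustrative, not taken from the dataset.
terms  <- c(12, 36, 60)
slopes <- sapply(terms, function(n) monthly_payment(1, 0.20, n))
names(slopes) <- paste(terms, "months")
print(round(slopes, 4))
```

Under these assumptions, a longer term at the same rate always gives a smaller payment per unit borrowed, which is why a lower slope is consistent with a longer duration.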

## [1] -331.8527

## [1] -9446.689

## [1] -198340.4

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Final Plots and Summary

Plot One

Description One

Plot Two

Description Two

Plot Three

Description Three

Reflection